高性能版Spark全密态计算引擎性能测试报告

更新时间:

本文介绍高性能版Spark全密态计算引擎与AnalyticDB for MySQL Spark的性能测试流程及对比结果,通过对比可知,在处理大规模数据集时,高性能版Spark全密态计算引擎相较于AnalyticDB for MySQL Spark 3.2.0版本的性能提升了0.9倍,具体性能对比详情请参见本文。

重要

本测试并未完全依照TPC-H基准测试规范,而是基于该测试规范进行了修改。本测试结果不能等同于完全遵守TPC-H测试规范所获得的测试结果,因此不能与完全遵守该测试规范获得的测试结果进行对比。

前提条件

  • 集群的产品系列为企业版、基础版或湖仓版

  • 集群与OSS存储空间位于相同地域。

  • 已创建Job型资源组。具体操作,请参见新建资源组

  • 已创建数据库账号。

  • 已为RAM用户授予AliyunADBFullAccess、AliyunADBSparkProcessingDataRoleAnalyticDB MySQL库表的读写权限。具体操作,请参见账号授权

测试环境

配置项

说明

备注

地域及可用区

华东1(杭州)可用区K

不涉及。

网络类型

专有网络VPC

不涉及。

产品系列

湖仓版集群

不涉及。

Spark资源规格

2xlarge(16 core 64 GB,8台)

不涉及。

执行引擎

  • AnalyticDB for MySQL Spark 3.2.0版本

  • 高性能版Spark全密态计算引擎

对比AnalyticDB for MySQL Spark和高性能版Spark全密态计算引擎的性能。

测试数据量

明文数据:1TB

密文数据:1TB

明文数据与密文数据的文件类型均为Parquet。

存储

OSS存储空间

不涉及。

测试数据

本文示例以TPC-H数据集为例。

  1. TPC官网下载TPC-H标准的数据生成工具DBGEN。

  2. 构造测试数据。更多信息,请参见构建数据

    ./dbgen -s $scale -C $chunks -S $i -f

    TPC-H测试数据集包含customerlineitemnationorderspartpartsuppregionsupplier八张表。

  3. 将上述步骤生成的八张表转换为Parquet格式。本文以customer为例,示例如下:

      val df = spark.read
      .format("csv")
      .option("header", "false")
      .option("delimiter", "|")
      .schema(StructType {
        StructField("c_custkey", IntegerType, true) ::
          StructField("c_name", StringType, true) ::
          StructField("c_address", StringType, true) ::
          StructField("c_nationkey", IntegerType, true) ::
          StructField("c_phone", StringType, true) ::
          StructField("c_acctbal", FloatType, true) ::
          StructField("c_mktsegment", StringType, true) ::
          StructField("c_comment", StringType, true) ::
          Nil
      })
      .load("customer.tbl")
      df.write.parquet("parquet/customer")
    说明

    本文仅展示将customer表转换为Parquet格式的示例语句,其他表与本示例相似。

  4. TPC-H测试数据集上传至OSS存储空间。本文示例为:oss://testBucketName/adb/Spark

测试场景一:使用AnalyticDB for MySQL Spark处理明文数据

  1. 登录云原生数据仓库AnalyticDB MySQL控制台,在左上角选择集群所在地域。在左侧导航栏,单击集群列表,在企业版、基础版或湖仓版页签下,单击目标集群ID。

  2. 在左侧导航栏,单击作业开发 > SQL开发

  3. SQLConsole窗口,选择Spark引擎和Job型资源组。

  4. 创建数据库。

    SET spark.driver.resourceSpec=2xlarge;
    SET spark.executor.instances=8;
    SET spark.executor.resourceSpec=2xlarge;
    CREATE DATABASE adb_plain_db;
  5. 创建外表,用于存储明文数据。LOCATIONTPC-H测试数据集所在的OSS路径。本文以customer表为例,OSS路径为oss://testBucketName/adb/Spark/customer

    SET spark.driver.resourceSpec=2xlarge;
    SET spark.executor.instances=8;
    SET spark.executor.resourceSpec=2xlarge;
    USE adb_plain_db; 
    CREATE TABLE IF NOT EXISTS customer 
     (
     c_custkey long,
     c_name string,
     c_address string,
     c_nationkey long,
     c_phone string,
     c_acctbal decimal(12, 2),
     c_mktsegment string,
     c_comment string
    )
    USING parquet 
    LOCATION 'oss://testBucketName/adb/Spark/customer';
  6. 执行SQL计算,并记录SQL执行耗时。

    SET spark.driver.resourceSpec=2xlarge;
    SET spark.executor.instances=8;
    SET spark.executor.resourceSpec=2xlarge;
    SELECT
     l_returnflag,
     l_linestatus,
     sum(l_quantity) AS sum_qty,
     sum(l_extendedprice) AS sum_base_price,
     sum(l_extendedprice * (1 - l_discount)) AS sum_disc_price,
     sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
     avg(l_quantity) AS avg_qty,
     avg(l_extendedprice) AS avg_price,
     avg(l_discount) AS avg_disc,
     count(*) AS count_order
    FROM
     lineitem
    WHERE
     l_shipdate <= date'1998-09-02' - interval 1 day
    GROUP BY
     l_returnflag,
     l_linestatus
    ORDER BY
     l_returnflag,
     l_linestatus;
    
    SELECT
     s_acctbal,
     s_name,
     n_name,
     p_partkey,
     p_mfgr,
     s_address,
     s_phone,
     s_comment
    FROM
     part,
     supplier,
     partsupp,
     nation,
     region
    WHERE
     p_partkey = ps_partkey
     AND s_suppkey = ps_suppkey
     AND p_size = 15
     AND p_type LIKE '%BRASS'
     AND s_nationkey = n_nationkey
     AND n_regionkey = r_regionkey
     AND r_name = 'EUROPE'
     AND ps_supplycost = (
     SELECT
     min(ps_supplycost)
     FROM
     partsupp,
     supplier,
     nation,
     region
     WHERE
     p_partkey = ps_partkey
     AND s_suppkey = ps_suppkey
     AND s_nationkey = n_nationkey
     AND n_regionkey = r_regionkey
     AND r_name = 'EUROPE')
    ORDER BY
     s_acctbal DESC,
     n_name,
     s_name,
     p_partkey
    LIMIT 100;
    
    
    SELECT
     l_orderkey,
     sum(l_extendedprice * (1 - l_discount)) AS revenue,
     o_orderdate,
     o_shippriority
    FROM
     customer,
     orders,
     lineitem
    WHERE
     c_mktsegment = 'BUILDING'
     AND c_custkey = o_custkey
     AND l_orderkey = o_orderkey
     AND o_orderdate < date'1995-03-15'
     AND l_shipdate > date'1995-03-15'
    GROUP BY
     l_orderkey,
     o_orderdate,
     o_shippriority
    ORDER BY
     revenue DESC,
     o_orderdate
    LIMIT 10;
    
    SELECT
     o_orderpriority,
     count(*) AS order_count
    FROM
     orders
    WHERE
     o_orderdate >= date'1993-07-01'
     AND o_orderdate < date'1993-07-01' + interval 3 month
     AND EXISTS (
     SELECT
     *
     FROM
     lineitem
     WHERE
     l_orderkey = o_orderkey
     AND l_commitdate < l_receiptdate)
    GROUP BY
     o_orderpriority
    ORDER BY
     o_orderpriority;
    
    SELECT
     n_name,
     sum(l_extendedprice * (1 - l_discount)) AS revenue
    FROM
     customer,
     orders,
     lineitem,
     supplier,
     nation,
     region
    WHERE
     c_custkey = o_custkey
     AND l_orderkey = o_orderkey
     AND l_suppkey = s_suppkey
     AND c_nationkey = s_nationkey
     AND s_nationkey = n_nationkey
     AND n_regionkey = r_regionkey
     AND r_name = 'ASIA'
     AND o_orderdate >= date'1994-01-01'
     AND o_orderdate < date'1994-01-01' + interval 1 year
    GROUP BY
     n_name
    ORDER BY
     revenue DESC;
    
    SELECT
     sum(l_extendedprice * l_discount) AS revenue
    FROM
     lineitem
    WHERE
     l_shipdate >= date'1994-01-01'
     AND l_shipdate < date'1994-01-01' + interval 1 year
     AND l_discount BETWEEN 0.06 - 0.01 AND 0.06 + 0.01
     AND l_quantity < 24;
    
    SELECT
     supp_nation,
     cust_nation,
     l_year,
     sum(volume) AS revenue
    FROM (
     SELECT
     n1.n_name AS supp_nation,
     n2.n_name AS cust_nation,
     extract(year FROM l_shipdate) AS l_year,
     l_extendedprice * (1 - l_discount) AS volume
     FROM
     supplier,
     lineitem,
     orders,
     customer,
     nation n1,
     nation n2
     WHERE
     s_suppkey = l_suppkey
     AND o_orderkey = l_orderkey
     AND c_custkey = o_custkey
     AND s_nationkey = n1.n_nationkey
     AND c_nationkey = n2.n_nationkey
     AND ((n1.n_name = 'FRANCE'
     AND n2.n_name = 'GERMANY')
     OR (n1.n_name = 'GERMANY'
     AND n2.n_name = 'FRANCE'))
     AND l_shipdate BETWEEN date'1995-01-01' AND date'1996-12-31') AS shipping
    GROUP BY
     supp_nation,
     cust_nation,
     l_year
    ORDER BY
     supp_nation,
     cust_nation,
     l_year;
    
    SELECT
     o_year,
     sum(
     CASE WHEN nation = 'BRAZIL' THEN
     volume
     ELSE
     0
     END) / sum(volume) AS mkt_share
    FROM (
     SELECT
     extract(year FROM o_orderdate) AS o_year,
     l_extendedprice * (1 - l_discount) AS volume,
     n2.n_name AS nation
     FROM
     part,
     supplier,
     lineitem,
     orders,
     customer,
     nation n1,
     nation n2,
     region
     WHERE
     p_partkey = l_partkey
     AND s_suppkey = l_suppkey
     AND l_orderkey = o_orderkey
     AND o_custkey = c_custkey
     AND c_nationkey = n1.n_nationkey
     AND n1.n_regionkey = r_regionkey
     AND r_name = 'AMERICA'
     AND s_nationkey = n2.n_nationkey
     AND o_orderdate BETWEEN date'1995-01-01' AND date'1996-12-31'
     AND p_type = 'ECONOMY ANODIZED STEEL') AS all_nations
    GROUP BY
     o_year
    ORDER BY
     o_year;
    
    SELECT
     nation,
     o_year,
     sum(amount) AS sum_profit
    FROM (
     SELECT
     n_name AS nation,
     extract(year FROM o_orderdate) AS o_year,
     l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity AS amount
     FROM
     part,
     supplier,
     lineitem,
     partsupp,
     orders,
     nation
     WHERE
     s_suppkey = l_suppkey
     AND ps_suppkey = l_suppkey
     AND ps_partkey = l_partkey
     AND p_partkey = l_partkey
     AND o_orderkey = l_orderkey
     AND s_nationkey = n_nationkey
     AND p_name LIKE '%green%') AS profit
    GROUP BY
     nation,
     o_year
    ORDER BY
     nation,
     o_year DESC;
    
    SELECT
     c_custkey,
     c_name,
     sum(l_extendedprice * (1 - l_discount)) AS revenue,
     c_acctbal,
     n_name,
     c_address,
     c_phone,
     c_comment
    FROM
     customer,
     orders,
     lineitem,
     nation
    WHERE
     c_custkey = o_custkey
     AND l_orderkey = o_orderkey
     AND o_orderdate >= date'1993-10-01' AND o_orderdate < date'1993-10-01' + interval 3 month
     AND l_returnflag = 'R'
     AND c_nationkey = n_nationkey
    GROUP BY
     c_custkey,
     c_name,
     c_acctbal,
     c_phone,
     n_name,
     c_address,
     c_comment
    ORDER BY
     revenue DESC
    LIMIT 20;
    
    SELECT
     ps_partkey,
     sum(ps_supplycost * ps_availqty) AS value
    FROM
     partsupp,
     supplier,
     nation
    WHERE
     ps_suppkey = s_suppkey
     AND s_nationkey = n_nationkey
     AND n_name = 'GERMANY'
    GROUP BY
     ps_partkey
    HAVING
     sum(ps_supplycost * ps_availqty) > (
     SELECT
     sum(ps_supplycost * ps_availqty) * 0.0001000000
     FROM
     partsupp,
     supplier,
     nation
     WHERE
     ps_suppkey = s_suppkey
     AND s_nationkey = n_nationkey
     AND n_name = 'GERMANY')
    ORDER BY
     value DESC;
    
    SELECT
     l_shipmode,
     sum(
     CASE WHEN o_orderpriority = '1-URGENT'
     OR o_orderpriority = '2-HIGH' THEN
     1
     ELSE
     0
     END) AS high_line_count,
     sum(
     CASE WHEN o_orderpriority <> '1-URGENT'
     AND o_orderpriority <> '2-HIGH' THEN
     1
     ELSE
     0
     END) AS low_line_count
    FROM
     orders,
     lineitem
    WHERE
     o_orderkey = l_orderkey
     AND l_shipmode IN ('MAIL', 'SHIP')
     AND l_commitdate < l_receiptdate
     AND l_shipdate < l_commitdate
     AND l_receiptdate >= date'1994-01-01' AND l_receiptdate < date'1994-01-01' + interval 1 year
    GROUP BY
     l_shipmode
    ORDER BY
     l_shipmode;
    
    SELECT
     c_count,
     count(*) AS custdist
    FROM (
     SELECT
     c_custkey,
     count(o_orderkey)
     FROM
     customer
     LEFT OUTER JOIN orders ON c_custkey = o_custkey
     AND o_comment NOT LIKE '%special%requests%'
    GROUP BY
     c_custkey) AS c_orders (c_custkey,
     c_count)
    GROUP BY
     c_count
    ORDER BY
     custdist DESC,
     c_count DESC;
    
    SELECT
     100.00 * sum(
     CASE WHEN p_type LIKE 'PROMO%' THEN
     l_extendedprice * (1 - l_discount)
     ELSE
     0
     END) / sum(l_extendedprice * (1 - l_discount)) AS promo_revenue
    FROM
     lineitem,
     part
    WHERE
     l_partkey = p_partkey
     AND l_shipdate >= date'1995-09-01'
     AND l_shipdate < date'1995-09-01' + interval 1 month;
    
    
    SELECT
     s_suppkey,
     s_name,
     s_address,
     s_phone,
     total_revenue
    FROM
     supplier,
     (
     SELECT
     l_suppkey AS supplier_no,
     sum(l_extendedprice * (1 - l_discount)) AS total_revenue
     FROM
     lineitem
     WHERE
     l_shipdate >= date'1996-01-01' AND l_shipdate < date'1996-01-01' + interval 3 month
     GROUP BY
     supplier_no) revenue0
    WHERE
     s_suppkey = supplier_no
     AND total_revenue = (
     SELECT
     max(total_revenue)
     FROM (
     SELECT
     l_suppkey AS supplier_no,
     sum(l_extendedprice * (1 - l_discount)) AS total_revenue
     FROM
     lineitem
     WHERE
     l_shipdate >= date'1996-01-01' AND l_shipdate < date'1996-01-01' + interval 3 month
     GROUP BY
     supplier_no) revenue1)
    ORDER BY
     s_suppkey;
    
    SELECT
     p_brand,
     p_type,
     p_size,
     count(DISTINCT ps_suppkey) AS supplier_cnt
    FROM
     partsupp,
     part
    WHERE
     p_partkey = ps_partkey
     AND p_brand <> 'Brand#45'
     AND p_type NOT LIKE 'MEDIUM POLISHED%'
     AND p_size IN (49, 14, 23, 45, 19, 3, 36, 9)
     AND ps_suppkey NOT IN (
     SELECT
     s_suppkey
     FROM
     supplier
     WHERE
     s_comment LIKE '%Customer%Complaints%')
    GROUP BY
     p_brand,
     p_type,
     p_size
    ORDER BY
     supplier_cnt DESC,
     p_brand,
     p_type,
     p_size;
    
    SELECT
     sum(l_extendedprice) / 7.0 AS avg_yearly
    FROM
     lineitem,
     part
    WHERE
     p_partkey = l_partkey
     AND p_brand = 'Brand#23'
     AND p_container = 'MED BOX'
     AND l_quantity < (
     SELECT
     0.2 * avg(l_quantity)
     FROM
     lineitem
     WHERE
     l_partkey = p_partkey);
    
    SELECT
     c_name,
     c_custkey,
     o_orderkey,
     o_orderdate,
     o_totalprice,
     sum(l_quantity)
    FROM
     customer,
     orders,
     lineitem
    WHERE
     o_orderkey IN (
     SELECT
     l_orderkey
     FROM
     lineitem
     GROUP BY
     l_orderkey
     HAVING
     sum(l_quantity) > 300)
     AND c_custkey = o_custkey
     AND o_orderkey = l_orderkey
    GROUP BY
     c_name,
     c_custkey,
     o_orderkey,
     o_orderdate,
     o_totalprice
    ORDER BY
     o_totalprice DESC,
     o_orderdate
    LIMIT 100;
    
    SELECT
     sum(l_extendedprice * (1 - l_discount)) AS revenue
    FROM
     lineitem,
     part
    WHERE (p_partkey = l_partkey
     AND p_brand = 'Brand#12'
     AND p_container IN ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
     AND l_quantity >= 1
     AND l_quantity <= 1 + 10
     AND p_size BETWEEN 1 AND 5
     AND l_shipmode IN ('AIR', 'AIR REG')
     AND l_shipinstruct = 'DELIVER IN PERSON')
     OR (p_partkey = l_partkey
     AND p_brand = 'Brand#23'
     AND p_container IN ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
     AND l_quantity >= 10
     AND l_quantity <= 10 + 10
     AND p_size BETWEEN 1 AND 10
     AND l_shipmode IN ('AIR', 'AIR REG')
     AND l_shipinstruct = 'DELIVER IN PERSON')
     OR (p_partkey = l_partkey
     AND p_brand = 'Brand#34'
     AND p_container IN ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
     AND l_quantity >= 20
     AND l_quantity <= 20 + 10
     AND p_size BETWEEN 1 AND 15
     AND l_shipmode IN ('AIR', 'AIR REG')
     AND l_shipinstruct = 'DELIVER IN PERSON');
    
    SELECT
     s_name,
     s_address
    FROM
     supplier,
     nation
    WHERE
     s_suppkey IN (
     SELECT
     ps_suppkey
     FROM
     partsupp
     WHERE
     ps_partkey IN (
     SELECT
     p_partkey
     FROM
     part
     WHERE
     p_name LIKE 'forest%')
     AND ps_availqty > (
     SELECT
     0.5 * sum(l_quantity)
     FROM
     lineitem
     WHERE
     l_partkey = ps_partkey
     AND l_suppkey = ps_suppkey
     AND l_shipdate >= date'1994-01-01'
     AND l_shipdate < date'1994-01-01' + interval 1 year))
     AND s_nationkey = n_nationkey
     AND n_name = 'CANADA'
     ORDER BY
     s_name;
    
    SELECT
     s_name,
     count(*) AS numwait
    FROM
     supplier,
     lineitem l1,
     orders,
     nation
    WHERE
     s_suppkey = l1.l_suppkey
     AND o_orderkey = l1.l_orderkey
     AND o_orderstatus = 'F'
     AND l1.l_receiptdate > l1.l_commitdate
     AND EXISTS (
     SELECT
     *
     FROM
     lineitem l2
     WHERE
     l2.l_orderkey = l1.l_orderkey
     AND l2.l_suppkey <> l1.l_suppkey)
     AND NOT EXISTS (
     SELECT
     *
     FROM
     lineitem l3
     WHERE
     l3.l_orderkey = l1.l_orderkey
     AND l3.l_suppkey <> l1.l_suppkey
     AND l3.l_receiptdate > l3.l_commitdate)
     AND s_nationkey = n_nationkey
     AND n_name = 'SAUDI ARABIA'
    GROUP BY
     s_name
    ORDER BY
     numwait DESC,
     s_name
    LIMIT 100;
    
    SELECT
     cntrycode,
     count(*) AS numcust,
     sum(c_acctbal) AS totacctbal
    FROM (
     SELECT
     substring(c_phone FROM 1 FOR 2) AS cntrycode,
     c_acctbal
     FROM
     customer
     WHERE
     substring(c_phone FROM 1 FOR 2) IN ('13', '31', '23', '29', '30', '18', '17')
     AND c_acctbal > (
     SELECT
     avg(c_acctbal)
     FROM
     customer
     WHERE
     c_acctbal > 0.00
     AND substring(c_phone FROM 1 FOR 2) IN ('13', '31', '23', '29', '30', '18', '17'))
     AND NOT EXISTS (
     SELECT
     *
     FROM
     orders
     WHERE
     o_custkey = c_custkey)) AS custsale
    GROUP BY
     cntrycode
    ORDER BY
     cntrycode;
    

测试场景二:使用高性能版Spark全密态计算引擎处理明文数据

  1. 登录云原生数据仓库AnalyticDB MySQL控制台,在左上角选择集群所在地域。在左侧导航栏,单击集群列表,在企业版、基础版或湖仓版页签下,单击目标集群ID。

  2. 在左侧导航栏,单击作业开发 > SQL开发

  3. SQLConsole窗口,选择Spark引擎和Job型资源组。

  4. 开启密态计算,并创建数据库。

    -- 开启密态计算
    SET spark.adb.native.enabled=true;
    -- 配置Spark资源规格
    SET spark.driver.resourceSpec=2xlarge;
    SET spark.executor.instances=8;
    SET spark.executor.resourceSpec=2xlarge;
    -- 创建数据库
    CREATE DATABASE adb_external_db;
  5. 创建OSS外表,用于存储明文数据。LOCATIONTPC-H测试数据集所在的OSS路径。本文以customer表为例,OSS路径为oss://testBucketName/adb/Spark/customer

    SET spark.adb.native.enabled=true;
    SET spark.driver.resourceSpec=2xlarge;
    SET spark.executor.instances=8;
    SET spark.executor.resourceSpec=2xlarge;
    USE adb_external_db;
    CREATE TABLE IF NOT EXISTS customer 
     (
     c_custkey long,
     c_name string,
     c_address string,
     c_nationkey long,
     c_phone string,
     c_acctbal decimal(12, 2),
     c_mktsegment string,
     c_comment string
    )
    USING parquet 
    LOCATION 'oss://testBucketName/adb/Spark/customer';
  6. 执行SQL计算,并记录SQL执行耗时。

    SET spark.driver.resourceSpec=2xlarge;
    SET spark.executor.instances=8;
    SET spark.executor.resourceSpec=2xlarge;
    SELECT
     l_returnflag,
     l_linestatus,
     sum(l_quantity) AS sum_qty,
     sum(l_extendedprice) AS sum_base_price,
     sum(l_extendedprice * (1 - l_discount)) AS sum_disc_price,
     sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
     avg(l_quantity) AS avg_qty,
     avg(l_extendedprice) AS avg_price,
     avg(l_discount) AS avg_disc,
     count(*) AS count_order
    FROM
     lineitem
    WHERE
     l_shipdate <= date'1998-09-02' - interval 1 day
    GROUP BY
     l_returnflag,
     l_linestatus
    ORDER BY
     l_returnflag,
     l_linestatus;
    
    SELECT
     s_acctbal,
     s_name,
     n_name,
     p_partkey,
     p_mfgr,
     s_address,
     s_phone,
     s_comment
    FROM
     part,
     supplier,
     partsupp,
     nation,
     region
    WHERE
     p_partkey = ps_partkey
     AND s_suppkey = ps_suppkey
     AND p_size = 15
     AND p_type LIKE '%BRASS'
     AND s_nationkey = n_nationkey
     AND n_regionkey = r_regionkey
     AND r_name = 'EUROPE'
     AND ps_supplycost = (
     SELECT
     min(ps_supplycost)
     FROM
     partsupp,
     supplier,
     nation,
     region
     WHERE
     p_partkey = ps_partkey
     AND s_suppkey = ps_suppkey
     AND s_nationkey = n_nationkey
     AND n_regionkey = r_regionkey
     AND r_name = 'EUROPE')
    ORDER BY
     s_acctbal DESC,
     n_name,
     s_name,
     p_partkey
    LIMIT 100;
    
    
    SELECT
     l_orderkey,
     sum(l_extendedprice * (1 - l_discount)) AS revenue,
     o_orderdate,
     o_shippriority
    FROM
     customer,
     orders,
     lineitem
    WHERE
     c_mktsegment = 'BUILDING'
     AND c_custkey = o_custkey
     AND l_orderkey = o_orderkey
     AND o_orderdate < date'1995-03-15'
     AND l_shipdate > date'1995-03-15'
    GROUP BY
     l_orderkey,
     o_orderdate,
     o_shippriority
    ORDER BY
     revenue DESC,
     o_orderdate
    LIMIT 10;
    
    SELECT
     o_orderpriority,
     count(*) AS order_count
    FROM
     orders
    WHERE
     o_orderdate >= date'1993-07-01'
     AND o_orderdate < date'1993-07-01' + interval 3 month
     AND EXISTS (
     SELECT
     *
     FROM
     lineitem
     WHERE
     l_orderkey = o_orderkey
     AND l_commitdate < l_receiptdate)
    GROUP BY
     o_orderpriority
    ORDER BY
     o_orderpriority;
    
    SELECT
     n_name,
     sum(l_extendedprice * (1 - l_discount)) AS revenue
    FROM
     customer,
     orders,
     lineitem,
     supplier,
     nation,
     region
    WHERE
     c_custkey = o_custkey
     AND l_orderkey = o_orderkey
     AND l_suppkey = s_suppkey
     AND c_nationkey = s_nationkey
     AND s_nationkey = n_nationkey
     AND n_regionkey = r_regionkey
     AND r_name = 'ASIA'
     AND o_orderdate >= date'1994-01-01'
     AND o_orderdate < date'1994-01-01' + interval 1 year
    GROUP BY
     n_name
    ORDER BY
     revenue DESC;
    
    SELECT
     sum(l_extendedprice * l_discount) AS revenue
    FROM
     lineitem
    WHERE
     l_shipdate >= date'1994-01-01'
     AND l_shipdate < date'1994-01-01' + interval 1 year
     AND l_discount BETWEEN 0.06 - 0.01 AND 0.06 + 0.01
     AND l_quantity < 24;
    
    SELECT
     supp_nation,
     cust_nation,
     l_year,
     sum(volume) AS revenue
    FROM (
     SELECT
     n1.n_name AS supp_nation,
     n2.n_name AS cust_nation,
     extract(year FROM l_shipdate) AS l_year,
     l_extendedprice * (1 - l_discount) AS volume
     FROM
     supplier,
     lineitem,
     orders,
     customer,
     nation n1,
     nation n2
     WHERE
     s_suppkey = l_suppkey
     AND o_orderkey = l_orderkey
     AND c_custkey = o_custkey
     AND s_nationkey = n1.n_nationkey
     AND c_nationkey = n2.n_nationkey
     AND ((n1.n_name = 'FRANCE'
     AND n2.n_name = 'GERMANY')
     OR (n1.n_name = 'GERMANY'
     AND n2.n_name = 'FRANCE'))
     AND l_shipdate BETWEEN date'1995-01-01' AND date'1996-12-31') AS shipping
    GROUP BY
     supp_nation,
     cust_nation,
     l_year
    ORDER BY
     supp_nation,
     cust_nation,
     l_year;
    
    SELECT
     o_year,
     sum(
     CASE WHEN nation = 'BRAZIL' THEN
     volume
     ELSE
     0
     END) / sum(volume) AS mkt_share
    FROM (
     SELECT
     extract(year FROM o_orderdate) AS o_year,
     l_extendedprice * (1 - l_discount) AS volume,
     n2.n_name AS nation
     FROM
     part,
     supplier,
     lineitem,
     orders,
     customer,
     nation n1,
     nation n2,
     region
     WHERE
     p_partkey = l_partkey
     AND s_suppkey = l_suppkey
     AND l_orderkey = o_orderkey
     AND o_custkey = c_custkey
     AND c_nationkey = n1.n_nationkey
     AND n1.n_regionkey = r_regionkey
     AND r_name = 'AMERICA'
     AND s_nationkey = n2.n_nationkey
     AND o_orderdate BETWEEN date'1995-01-01' AND date'1996-12-31'
     AND p_type = 'ECONOMY ANODIZED STEEL') AS all_nations
    GROUP BY
     o_year
    ORDER BY
     o_year;
    
    SELECT
     nation,
     o_year,
     sum(amount) AS sum_profit
    FROM (
     SELECT
     n_name AS nation,
     extract(year FROM o_orderdate) AS o_year,
     l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity AS amount
     FROM
     part,
     supplier,
     lineitem,
     partsupp,
     orders,
     nation
     WHERE
     s_suppkey = l_suppkey
     AND ps_suppkey = l_suppkey
     AND ps_partkey = l_partkey
     AND p_partkey = l_partkey
     AND o_orderkey = l_orderkey
     AND s_nationkey = n_nationkey
     AND p_name LIKE '%green%') AS profit
    GROUP BY
     nation,
     o_year
    ORDER BY
     nation,
     o_year DESC;
    
    SELECT
     c_custkey,
     c_name,
     sum(l_extendedprice * (1 - l_discount)) AS revenue,
     c_acctbal,
     n_name,
     c_address,
     c_phone,
     c_comment
    FROM
     customer,
     orders,
     lineitem,
     nation
    WHERE
     c_custkey = o_custkey
     AND l_orderkey = o_orderkey
     AND o_orderdate >= date'1993-10-01' AND o_orderdate < date'1993-10-01' + interval 3 month
     AND l_returnflag = 'R'
     AND c_nationkey = n_nationkey
    GROUP BY
     c_custkey,
     c_name,
     c_acctbal,
     c_phone,
     n_name,
     c_address,
     c_comment
    ORDER BY
     revenue DESC
    LIMIT 20;
    
    SELECT
     ps_partkey,
     sum(ps_supplycost * ps_availqty) AS value
    FROM
     partsupp,
     supplier,
     nation
    WHERE
     ps_suppkey = s_suppkey
     AND s_nationkey = n_nationkey
     AND n_name = 'GERMANY'
    GROUP BY
     ps_partkey
    HAVING
     sum(ps_supplycost * ps_availqty) > (
     SELECT
     sum(ps_supplycost * ps_availqty) * 0.0001000000
     FROM
     partsupp,
     supplier,
     nation
     WHERE
     ps_suppkey = s_suppkey
     AND s_nationkey = n_nationkey
     AND n_name = 'GERMANY')
    ORDER BY
     value DESC;
    
    SELECT
     l_shipmode,
     sum(
     CASE WHEN o_orderpriority = '1-URGENT'
     OR o_orderpriority = '2-HIGH' THEN
     1
     ELSE
     0
     END) AS high_line_count,
     sum(
     CASE WHEN o_orderpriority <> '1-URGENT'
     AND o_orderpriority <> '2-HIGH' THEN
     1
     ELSE
     0
     END) AS low_line_count
    FROM
     orders,
     lineitem
    WHERE
     o_orderkey = l_orderkey
     AND l_shipmode IN ('MAIL', 'SHIP')
     AND l_commitdate < l_receiptdate
     AND l_shipdate < l_commitdate
     AND l_receiptdate >= date'1994-01-01' AND l_receiptdate < date'1994-01-01' + interval 1 year
    GROUP BY
     l_shipmode
    ORDER BY
     l_shipmode;
    
    SELECT
     c_count,
     count(*) AS custdist
    FROM (
     SELECT
     c_custkey,
     count(o_orderkey)
     FROM
     customer
     LEFT OUTER JOIN orders ON c_custkey = o_custkey
     AND o_comment NOT LIKE '%special%requests%'
    GROUP BY
     c_custkey) AS c_orders (c_custkey,
     c_count)
    GROUP BY
     c_count
    ORDER BY
     custdist DESC,
     c_count DESC;
    
    SELECT
     100.00 * sum(
     CASE WHEN p_type LIKE 'PROMO%' THEN
     l_extendedprice * (1 - l_discount)
     ELSE
     0
     END) / sum(l_extendedprice * (1 - l_discount)) AS promo_revenue
    FROM
     lineitem,
     part
    WHERE
     l_partkey = p_partkey
     AND l_shipdate >= date'1995-09-01'
     AND l_shipdate < date'1995-09-01' + interval 1 month;
    
    
    SELECT
     s_suppkey,
     s_name,
     s_address,
     s_phone,
     total_revenue
    FROM
     supplier,
     (
     SELECT
     l_suppkey AS supplier_no,
     sum(l_extendedprice * (1 - l_discount)) AS total_revenue
     FROM
     lineitem
     WHERE
     l_shipdate >= date'1996-01-01' AND l_shipdate < date'1996-01-01' + interval 3 month
     GROUP BY
     supplier_no) revenue0
    WHERE
     s_suppkey = supplier_no
     AND total_revenue = (
     SELECT
     max(total_revenue)
     FROM (
     SELECT
     l_suppkey AS supplier_no,
     sum(l_extendedprice * (1 - l_discount)) AS total_revenue
     FROM
     lineitem
     WHERE
     l_shipdate >= date'1996-01-01' AND l_shipdate < date'1996-01-01' + interval 3 month
     GROUP BY
     supplier_no) revenue1)
    ORDER BY
     s_suppkey;
    
    SELECT
     p_brand,
     p_type,
     p_size,
     count(DISTINCT ps_suppkey) AS supplier_cnt
    FROM
     partsupp,
     part
    WHERE
     p_partkey = ps_partkey
     AND p_brand <> 'Brand#45'
     AND p_type NOT LIKE 'MEDIUM POLISHED%'
     AND p_size IN (49, 14, 23, 45, 19, 3, 36, 9)
     AND ps_suppkey NOT IN (
     SELECT
     s_suppkey
     FROM
     supplier
     WHERE
     s_comment LIKE '%Customer%Complaints%')
    GROUP BY
     p_brand,
     p_type,
     p_size
    ORDER BY
     supplier_cnt DESC,
     p_brand,
     p_type,
     p_size;
    
    SELECT
     sum(l_extendedprice) / 7.0 AS avg_yearly
    FROM
     lineitem,
     part
    WHERE
     p_partkey = l_partkey
     AND p_brand = 'Brand#23'
     AND p_container = 'MED BOX'
     AND l_quantity < (
     SELECT
     0.2 * avg(l_quantity)
     FROM
     lineitem
     WHERE
     l_partkey = p_partkey);
    
    SELECT
     c_name,
     c_custkey,
     o_orderkey,
     o_orderdate,
     o_totalprice,
     sum(l_quantity)
    FROM
     customer,
     orders,
     lineitem
    WHERE
     o_orderkey IN (
     SELECT
     l_orderkey
     FROM
     lineitem
     GROUP BY
     l_orderkey
     HAVING
     sum(l_quantity) > 300)
     AND c_custkey = o_custkey
     AND o_orderkey = l_orderkey
    GROUP BY
     c_name,
     c_custkey,
     o_orderkey,
     o_orderdate,
     o_totalprice
    ORDER BY
     o_totalprice DESC,
     o_orderdate
    LIMIT 100;
    
    SELECT
     sum(l_extendedprice * (1 - l_discount)) AS revenue
    FROM
     lineitem,
     part
    WHERE (p_partkey = l_partkey
     AND p_brand = 'Brand#12'
     AND p_container IN ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
     AND l_quantity >= 1
     AND l_quantity <= 1 + 10
     AND p_size BETWEEN 1 AND 5
     AND l_shipmode IN ('AIR', 'AIR REG')
     AND l_shipinstruct = 'DELIVER IN PERSON')
     OR (p_partkey = l_partkey
     AND p_brand = 'Brand#23'
     AND p_container IN ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
     AND l_quantity >= 10
     AND l_quantity <= 10 + 10
     AND p_size BETWEEN 1 AND 10
     AND l_shipmode IN ('AIR', 'AIR REG')
     AND l_shipinstruct = 'DELIVER IN PERSON')
     OR (p_partkey = l_partkey
     AND p_brand = 'Brand#34'
     AND p_container IN ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
     AND l_quantity >= 20
     AND l_quantity <= 20 + 10
     AND p_size BETWEEN 1 AND 15
     AND l_shipmode IN ('AIR', 'AIR REG')
     AND l_shipinstruct = 'DELIVER IN PERSON');
    
    SELECT
     s_name,
     s_address
    FROM
     supplier,
     nation
    WHERE
     s_suppkey IN (
     SELECT
     ps_suppkey
     FROM
     partsupp
     WHERE
     ps_partkey IN (
     SELECT
     p_partkey
     FROM
     part
     WHERE
     p_name LIKE 'forest%')
     AND ps_availqty > (
     SELECT
     0.5 * sum(l_quantity)
     FROM
     lineitem
     WHERE
     l_partkey = ps_partkey
     AND l_suppkey = ps_suppkey
     AND l_shipdate >= date'1994-01-01'
     AND l_shipdate < date'1994-01-01' + interval 1 year))
     AND s_nationkey = n_nationkey
     AND n_name = 'CANADA'
     ORDER BY
     s_name;
    
    SELECT
     s_name,
     count(*) AS numwait
    FROM
     supplier,
     lineitem l1,
     orders,
     nation
    WHERE
     s_suppkey = l1.l_suppkey
     AND o_orderkey = l1.l_orderkey
     AND o_orderstatus = 'F'
     AND l1.l_receiptdate > l1.l_commitdate
     AND EXISTS (
     SELECT
     *
     FROM
     lineitem l2
     WHERE
     l2.l_orderkey = l1.l_orderkey
     AND l2.l_suppkey <> l1.l_suppkey)
     AND NOT EXISTS (
     SELECT
     *
     FROM
     lineitem l3
     WHERE
     l3.l_orderkey = l1.l_orderkey
     AND l3.l_suppkey <> l1.l_suppkey
     AND l3.l_receiptdate > l3.l_commitdate)
     AND s_nationkey = n_nationkey
     AND n_name = 'SAUDI ARABIA'
    GROUP BY
     s_name
    ORDER BY
     numwait DESC,
     s_name
    LIMIT 100;
    
    SELECT
     cntrycode,
     count(*) AS numcust,
     sum(c_acctbal) AS totacctbal
    FROM (
     SELECT
     substring(c_phone FROM 1 FOR 2) AS cntrycode,
     c_acctbal
     FROM
     customer
     WHERE
     substring(c_phone FROM 1 FOR 2) IN ('13', '31', '23', '29', '30', '18', '17')
     AND c_acctbal > (
     SELECT
     avg(c_acctbal)
     FROM
     customer
     WHERE
     c_acctbal > 0.00
     AND substring(c_phone FROM 1 FOR 2) IN ('13', '31', '23', '29', '30', '18', '17'))
     AND NOT EXISTS (
     SELECT
     *
     FROM
     orders
     WHERE
     o_custkey = c_custkey)) AS custsale
    GROUP BY
     cntrycode
    ORDER BY
     cntrycode;
    

测试场景三:高性能版Spark全密态计算引擎处理密文数据

  1. 登录云原生数据仓库AnalyticDB MySQL控制台,在左上角选择集群所在地域。在左侧导航栏,单击集群列表,在企业版、基础版或湖仓版页签下,单击目标集群ID。

  2. 在左侧导航栏,单击作业开发 > SQL开发

  3. SQLConsole窗口,选择Spark引擎和Job型资源组。

  4. 开启密态计算,设置用户的主密钥,并创建数据库。

    -- 开启密态计算
    SET spark.adb.native.enabled=true;
    -- 开启密文读写支持并设置主密钥列表,KMS Client 以及 CryptoFactory(开启后引擎可同时支持明文和密文)
    SET spark.hadoop.parquet.encryption.key.list=kf:MDEyMzQ1Njc4OTAxMjM0****,kc1:bvCDwqcOJGSdZSEMLjfk****,kc2:kflI/sq+uf50Qhl1MmtG****;
    SET spark.hadoop.parquet.encryption.kms.client.class=io.glutenproject.encryption.InMemoryKMS;
    SET spark.hadoop.parquet.crypto.factory.class=org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory;
    -- 配置Spark资源规格
    SET spark.driver.resourceSpec=2xlarge;
    SET spark.executor.instances=8;
    SET spark.executor.resourceSpec=2xlarge;
    --创建数据库
    CREATE database IF NOT EXISTS adb_encryption_db;
  5. 创建OSS外表,用于存储明文数据。LOCATIONTPC-H测试数据集所在的OSS路径。本文以customer_tmp表为例,OSS路径为oss://testBucketName/adb/Spark/customer

    SET spark.adb.native.enabled=true;
    SET spark.hadoop.parquet.encryption.key.list=kf:MDEyMzQ1Njc4OTAxMjM0****,kc1:bvCDwqcOJGSdZSEMLjfk****,kc2:kflI/sq+uf50Qhl1MmtG****;
    SET spark.hadoop.parquet.encryption.kms.client.class=io.glutenproject.encryption.InMemoryKMS;
    SET spark.hadoop.parquet.crypto.factory.class=org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory;
    SET spark.driver.resourceSpec=2xlarge;
    SET spark.executor.instances=8;
    SET spark.executor.resourceSpec=2xlarge;
    USE adb_encryption_db;
    CREATE TABLE IF NOT EXISTS customer_tmp
     (
     c_custkey long,
     c_name string,
     c_address string,
     c_nationkey long,
     c_phone string,
     c_acctbal decimal(12, 2),
     c_mktsegment string,
     c_comment string
    )
    USING parquet 
    LOCATION 'oss://testBucketName/adb/Spark/customer';
  6. 创建OSS外表,用于存储密文数据。本文示例将customer外表的数据指定存储在oss://testBucketName/adb/Spark/enc_customer

    SET spark.adb.native.enabled=true;
    SET spark.hadoop.parquet.encryption.key.list=kf:MDEyMzQ1Njc4OTAxMjM0****,kc1:bvCDwqcOJGSdZSEMLjfk****,kc2:kflI/sq+uf50Qhl1MmtG****;
    SET spark.hadoop.parquet.encryption.kms.client.class=io.glutenproject.encryption.InMemoryKMS;
    SET spark.hadoop.parquet.crypto.factory.class=org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory;
    SET spark.driver.resourceSpec=2xlarge;
    SET spark.executor.instances=8;
    SET spark.executor.resourceSpec=2xlarge;
    USE adb_encryption_db;
    CREATE TABLE IF NOT EXISTS customer
    USING Parquet
    OPTIONS (
     'parquet.encryption.column.keys'='kc1:c_name;kc2:c_phone',
     'parquet.encryption.footer.key'='kf'
    )
    LOCATION 'oss://testBucketName/adb/Spark/enc_customer'
    AS
    SELECT *
    FROM customer_tmp;
  7. 执行SQL计算,并记录SQL执行耗时。

    SET spark.driver.resourceSpec=2xlarge;
    SET spark.executor.instances=8;
    SET spark.executor.resourceSpec=2xlarge;
    SELECT
     l_returnflag,
     l_linestatus,
     sum(l_quantity) AS sum_qty,
     sum(l_extendedprice) AS sum_base_price,
     sum(l_extendedprice * (1 - l_discount)) AS sum_disc_price,
     sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
     avg(l_quantity) AS avg_qty,
     avg(l_extendedprice) AS avg_price,
     avg(l_discount) AS avg_disc,
     count(*) AS count_order
    FROM
     lineitem
    WHERE
     l_shipdate <= date'1998-09-02' - interval 1 day
    GROUP BY
     l_returnflag,
     l_linestatus
    ORDER BY
     l_returnflag,
     l_linestatus;
    
    SELECT
     s_acctbal,
     s_name,
     n_name,
     p_partkey,
     p_mfgr,
     s_address,
     s_phone,
     s_comment
    FROM
     part,
     supplier,
     partsupp,
     nation,
     region
    WHERE
     p_partkey = ps_partkey
     AND s_suppkey = ps_suppkey
     AND p_size = 15
     AND p_type LIKE '%BRASS'
     AND s_nationkey = n_nationkey
     AND n_regionkey = r_regionkey
     AND r_name = 'EUROPE'
     AND ps_supplycost = (
     SELECT
     min(ps_supplycost)
     FROM
     partsupp,
     supplier,
     nation,
     region
     WHERE
     p_partkey = ps_partkey
     AND s_suppkey = ps_suppkey
     AND s_nationkey = n_nationkey
     AND n_regionkey = r_regionkey
     AND r_name = 'EUROPE')
    ORDER BY
     s_acctbal DESC,
     n_name,
     s_name,
     p_partkey
    LIMIT 100;
    
    
    SELECT
     l_orderkey,
     sum(l_extendedprice * (1 - l_discount)) AS revenue,
     o_orderdate,
     o_shippriority
    FROM
     customer,
     orders,
     lineitem
    WHERE
     c_mktsegment = 'BUILDING'
     AND c_custkey = o_custkey
     AND l_orderkey = o_orderkey
     AND o_orderdate < date'1995-03-15'
     AND l_shipdate > date'1995-03-15'
    GROUP BY
     l_orderkey,
     o_orderdate,
     o_shippriority
    ORDER BY
     revenue DESC,
     o_orderdate
    LIMIT 10;
    
    SELECT
     o_orderpriority,
     count(*) AS order_count
    FROM
     orders
    WHERE
     o_orderdate >= date'1993-07-01'
     AND o_orderdate < date'1993-07-01' + interval 3 month
     AND EXISTS (
     SELECT
     *
     FROM
     lineitem
     WHERE
     l_orderkey = o_orderkey
     AND l_commitdate < l_receiptdate)
    GROUP BY
     o_orderpriority
    ORDER BY
     o_orderpriority;
    
    SELECT
     n_name,
     sum(l_extendedprice * (1 - l_discount)) AS revenue
    FROM
     customer,
     orders,
     lineitem,
     supplier,
     nation,
     region
    WHERE
     c_custkey = o_custkey
     AND l_orderkey = o_orderkey
     AND l_suppkey = s_suppkey
     AND c_nationkey = s_nationkey
     AND s_nationkey = n_nationkey
     AND n_regionkey = r_regionkey
     AND r_name = 'ASIA'
     AND o_orderdate >= date'1994-01-01'
     AND o_orderdate < date'1994-01-01' + interval 1 year
    GROUP BY
     n_name
    ORDER BY
     revenue DESC;
    
    SELECT
     sum(l_extendedprice * l_discount) AS revenue
    FROM
     lineitem
    WHERE
     l_shipdate >= date'1994-01-01'
     AND l_shipdate < date'1994-01-01' + interval 1 year
     AND l_discount BETWEEN 0.06 - 0.01 AND 0.06 + 0.01
     AND l_quantity < 24;
    
    SELECT
     supp_nation,
     cust_nation,
     l_year,
     sum(volume) AS revenue
    FROM (
     SELECT
     n1.n_name AS supp_nation,
     n2.n_name AS cust_nation,
     extract(year FROM l_shipdate) AS l_year,
     l_extendedprice * (1 - l_discount) AS volume
     FROM
     supplier,
     lineitem,
     orders,
     customer,
     nation n1,
     nation n2
     WHERE
     s_suppkey = l_suppkey
     AND o_orderkey = l_orderkey
     AND c_custkey = o_custkey
     AND s_nationkey = n1.n_nationkey
     AND c_nationkey = n2.n_nationkey
     AND ((n1.n_name = 'FRANCE'
     AND n2.n_name = 'GERMANY')
     OR (n1.n_name = 'GERMANY'
     AND n2.n_name = 'FRANCE'))
     AND l_shipdate BETWEEN date'1995-01-01' AND date'1996-12-31') AS shipping
    GROUP BY
     supp_nation,
     cust_nation,
     l_year
    ORDER BY
     supp_nation,
     cust_nation,
     l_year;
    
    SELECT
     o_year,
     sum(
     CASE WHEN nation = 'BRAZIL' THEN
     volume
     ELSE
     0
     END) / sum(volume) AS mkt_share
    FROM (
     SELECT
     extract(year FROM o_orderdate) AS o_year,
     l_extendedprice * (1 - l_discount) AS volume,
     n2.n_name AS nation
     FROM
     part,
     supplier,
     lineitem,
     orders,
     customer,
     nation n1,
     nation n2,
     region
     WHERE
     p_partkey = l_partkey
     AND s_suppkey = l_suppkey
     AND l_orderkey = o_orderkey
     AND o_custkey = c_custkey
     AND c_nationkey = n1.n_nationkey
     AND n1.n_regionkey = r_regionkey
     AND r_name = 'AMERICA'
     AND s_nationkey = n2.n_nationkey
     AND o_orderdate BETWEEN date'1995-01-01' AND date'1996-12-31'
     AND p_type = 'ECONOMY ANODIZED STEEL') AS all_nations
    GROUP BY
     o_year
    ORDER BY
     o_year;
    
    SELECT
     nation,
     o_year,
     sum(amount) AS sum_profit
    FROM (
     SELECT
     n_name AS nation,
     extract(year FROM o_orderdate) AS o_year,
     l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity AS amount
     FROM
     part,
     supplier,
     lineitem,
     partsupp,
     orders,
     nation
     WHERE
     s_suppkey = l_suppkey
     AND ps_suppkey = l_suppkey
     AND ps_partkey = l_partkey
     AND p_partkey = l_partkey
     AND o_orderkey = l_orderkey
     AND s_nationkey = n_nationkey
     AND p_name LIKE '%green%') AS profit
    GROUP BY
     nation,
     o_year
    ORDER BY
     nation,
     o_year DESC;
    
    SELECT
     c_custkey,
     c_name,
     sum(l_extendedprice * (1 - l_discount)) AS revenue,
     c_acctbal,
     n_name,
     c_address,
     c_phone,
     c_comment
    FROM
     customer,
     orders,
     lineitem,
     nation
    WHERE
     c_custkey = o_custkey
     AND l_orderkey = o_orderkey
     AND o_orderdate >= date'1993-10-01' AND o_orderdate < date'1993-10-01' + interval 3 month
     AND l_returnflag = 'R'
     AND c_nationkey = n_nationkey
    GROUP BY
     c_custkey,
     c_name,
     c_acctbal,
     c_phone,
     n_name,
     c_address,
     c_comment
    ORDER BY
     revenue DESC
    LIMIT 20;
    
    SELECT
     ps_partkey,
     sum(ps_supplycost * ps_availqty) AS value
    FROM
     partsupp,
     supplier,
     nation
    WHERE
     ps_suppkey = s_suppkey
     AND s_nationkey = n_nationkey
     AND n_name = 'GERMANY'
    GROUP BY
     ps_partkey
    HAVING
     sum(ps_supplycost * ps_availqty) > (
     SELECT
     sum(ps_supplycost * ps_availqty) * 0.0001000000
     FROM
     partsupp,
     supplier,
     nation
     WHERE
     ps_suppkey = s_suppkey
     AND s_nationkey = n_nationkey
     AND n_name = 'GERMANY')
    ORDER BY
     value DESC;
    
    SELECT
     l_shipmode,
     sum(
     CASE WHEN o_orderpriority = '1-URGENT'
     OR o_orderpriority = '2-HIGH' THEN
     1
     ELSE
     0
     END) AS high_line_count,
     sum(
     CASE WHEN o_orderpriority <> '1-URGENT'
     AND o_orderpriority <> '2-HIGH' THEN
     1
     ELSE
     0
     END) AS low_line_count
    FROM
     orders,
     lineitem
    WHERE
     o_orderkey = l_orderkey
     AND l_shipmode IN ('MAIL', 'SHIP')
     AND l_commitdate < l_receiptdate
     AND l_shipdate < l_commitdate
     AND l_receiptdate >= date'1994-01-01' AND l_receiptdate < date'1994-01-01' + interval 1 year
    GROUP BY
     l_shipmode
    ORDER BY
     l_shipmode;
    
    SELECT
     c_count,
     count(*) AS custdist
    FROM (
     SELECT
     c_custkey,
     count(o_orderkey)
     FROM
     customer
     LEFT OUTER JOIN orders ON c_custkey = o_custkey
     AND o_comment NOT LIKE '%special%requests%'
    GROUP BY
     c_custkey) AS c_orders (c_custkey,
     c_count)
    GROUP BY
     c_count
    ORDER BY
     custdist DESC,
     c_count DESC;
    
    SELECT
     100.00 * sum(
     CASE WHEN p_type LIKE 'PROMO%' THEN
     l_extendedprice * (1 - l_discount)
     ELSE
     0
     END) / sum(l_extendedprice * (1 - l_discount)) AS promo_revenue
    FROM
     lineitem,
     part
    WHERE
     l_partkey = p_partkey
     AND l_shipdate >= date'1995-09-01'
     AND l_shipdate < date'1995-09-01' + interval 1 month;
    
    
    SELECT
     s_suppkey,
     s_name,
     s_address,
     s_phone,
     total_revenue
    FROM
     supplier,
     (
     SELECT
     l_suppkey AS supplier_no,
     sum(l_extendedprice * (1 - l_discount)) AS total_revenue
     FROM
     lineitem
     WHERE
     l_shipdate >= date'1996-01-01' AND l_shipdate < date'1996-01-01' + interval 3 month
     GROUP BY
     supplier_no) revenue0
    WHERE
     s_suppkey = supplier_no
     AND total_revenue = (
     SELECT
     max(total_revenue)
     FROM (
     SELECT
     l_suppkey AS supplier_no,
     sum(l_extendedprice * (1 - l_discount)) AS total_revenue
     FROM
     lineitem
     WHERE
     l_shipdate >= date'1996-01-01' AND l_shipdate < date'1996-01-01' + interval 3 month
     GROUP BY
     supplier_no) revenue1)
    ORDER BY
     s_suppkey;
    
    SELECT
     p_brand,
     p_type,
     p_size,
     count(DISTINCT ps_suppkey) AS supplier_cnt
    FROM
     partsupp,
     part
    WHERE
     p_partkey = ps_partkey
     AND p_brand <> 'Brand#45'
     AND p_type NOT LIKE 'MEDIUM POLISHED%'
     AND p_size IN (49, 14, 23, 45, 19, 3, 36, 9)
     AND ps_suppkey NOT IN (
     SELECT
     s_suppkey
     FROM
     supplier
     WHERE
     s_comment LIKE '%Customer%Complaints%')
    GROUP BY
     p_brand,
     p_type,
     p_size
    ORDER BY
     supplier_cnt DESC,
     p_brand,
     p_type,
     p_size;
    
    SELECT
     sum(l_extendedprice) / 7.0 AS avg_yearly
    FROM
     lineitem,
     part
    WHERE
     p_partkey = l_partkey
     AND p_brand = 'Brand#23'
     AND p_container = 'MED BOX'
     AND l_quantity < (
     SELECT
     0.2 * avg(l_quantity)
     FROM
     lineitem
     WHERE
     l_partkey = p_partkey);
    
    SELECT
     c_name,
     c_custkey,
     o_orderkey,
     o_orderdate,
     o_totalprice,
     sum(l_quantity)
    FROM
     customer,
     orders,
     lineitem
    WHERE
     o_orderkey IN (
     SELECT
     l_orderkey
     FROM
     lineitem
     GROUP BY
     l_orderkey
     HAVING
     sum(l_quantity) > 300)
     AND c_custkey = o_custkey
     AND o_orderkey = l_orderkey
    GROUP BY
     c_name,
     c_custkey,
     o_orderkey,
     o_orderdate,
     o_totalprice
    ORDER BY
     o_totalprice DESC,
     o_orderdate
    LIMIT 100;
    
    SELECT
     sum(l_extendedprice * (1 - l_discount)) AS revenue
    FROM
     lineitem,
     part
    WHERE (p_partkey = l_partkey
     AND p_brand = 'Brand#12'
     AND p_container IN ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
     AND l_quantity >= 1
     AND l_quantity <= 1 + 10
     AND p_size BETWEEN 1 AND 5
     AND l_shipmode IN ('AIR', 'AIR REG')
     AND l_shipinstruct = 'DELIVER IN PERSON')
     OR (p_partkey = l_partkey
     AND p_brand = 'Brand#23'
     AND p_container IN ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
     AND l_quantity >= 10
     AND l_quantity <= 10 + 10
     AND p_size BETWEEN 1 AND 10
     AND l_shipmode IN ('AIR', 'AIR REG')
     AND l_shipinstruct = 'DELIVER IN PERSON')
     OR (p_partkey = l_partkey
     AND p_brand = 'Brand#34'
     AND p_container IN ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
     AND l_quantity >= 20
     AND l_quantity <= 20 + 10
     AND p_size BETWEEN 1 AND 15
     AND l_shipmode IN ('AIR', 'AIR REG')
     AND l_shipinstruct = 'DELIVER IN PERSON');
    
    SELECT
     s_name,
     s_address
    FROM
     supplier,
     nation
    WHERE
     s_suppkey IN (
     SELECT
     ps_suppkey
     FROM
     partsupp
     WHERE
     ps_partkey IN (
     SELECT
     p_partkey
     FROM
     part
     WHERE
     p_name LIKE 'forest%')
     AND ps_availqty > (
     SELECT
     0.5 * sum(l_quantity)
     FROM
     lineitem
     WHERE
     l_partkey = ps_partkey
     AND l_suppkey = ps_suppkey
     AND l_shipdate >= date'1994-01-01'
     AND l_shipdate < date'1994-01-01' + interval 1 year))
     AND s_nationkey = n_nationkey
     AND n_name = 'CANADA'
     ORDER BY
     s_name;
    
    SELECT
     s_name,
     count(*) AS numwait
    FROM
     supplier,
     lineitem l1,
     orders,
     nation
    WHERE
     s_suppkey = l1.l_suppkey
     AND o_orderkey = l1.l_orderkey
     AND o_orderstatus = 'F'
     AND l1.l_receiptdate > l1.l_commitdate
     AND EXISTS (
     SELECT
     *
     FROM
     lineitem l2
     WHERE
     l2.l_orderkey = l1.l_orderkey
     AND l2.l_suppkey <> l1.l_suppkey)
     AND NOT EXISTS (
     SELECT
     *
     FROM
     lineitem l3
     WHERE
     l3.l_orderkey = l1.l_orderkey
     AND l3.l_suppkey <> l1.l_suppkey
     AND l3.l_receiptdate > l3.l_commitdate)
     AND s_nationkey = n_nationkey
     AND n_name = 'SAUDI ARABIA'
    GROUP BY
     s_name
    ORDER BY
     numwait DESC,
     s_name
    LIMIT 100;
    
    SELECT
     cntrycode,
     count(*) AS numcust,
     sum(c_acctbal) AS totacctbal
    FROM (
     SELECT
     substring(c_phone FROM 1 FOR 2) AS cntrycode,
     c_acctbal
     FROM
     customer
     WHERE
     substring(c_phone FROM 1 FOR 2) IN ('13', '31', '23', '29', '30', '18', '17')
     AND c_acctbal > (
     SELECT
     avg(c_acctbal)
     FROM
     customer
     WHERE
     c_acctbal > 0.00
     AND substring(c_phone FROM 1 FOR 2) IN ('13', '31', '23', '29', '30', '18', '17'))
     AND NOT EXISTS (
     SELECT
     *
     FROM
     orders
     WHERE
     o_custkey = c_custkey)) AS custsale
    GROUP BY
     cntrycode
    ORDER BY
     cntrycode;
    

测试结果

各场景下的测试结果对比如下:

测试场景

SQL执行总耗时(分钟)

AnalyticDB for MySQL Spark 3.2.0版本处理明文数据

72

高性能版Spark全密态计算引擎处理明文数据

34

高性能版Spark全密态计算引擎处理密文数据

37

image

通过上述对比可以得知,在处理大规模数据集时,高性能版Spark全密态计算引擎相较于AnalyticDB for MySQL Spark 3.2.0版本性能提升了0.9倍。高性能版Spark全密态计算引擎在保证数据传输与存储过程安全的同时,也提升了数据处理效率。